Bayes and Big Data: The Consensus Monte Carlo Algorithm

نویسندگان

  • Steven L. Scott
  • Alexander W. Blocker
  • Fernando V. Bonassi
  • Hugh A. Chipman
  • Edward I. George
  • Robert E. McCulloch
چکیده

A useful definition of “big data” is data that is too big to comfortably process on a single machine, either because of processor, memory, or disk bottlenecks. Graphics processing units can alleviate the processor bottleneck, but memory or disk bottlenecks can only be eliminated by splitting data across multiple machines. Communication between large numbers of machines is expensive (regardless of the amount of data being communicated), so there is a need for algorithms that perform distributed approximate Bayesian analyses with minimal communication. Consensus Monte Carlo operates by running a separate Monte Carlo algorithm on each machine, and then averaging individual Monte Carlo draws across machines. Depending on the model, the resulting draws can be nearly indistinguishable from the draws that would have been obtained by running a single machine algorithm for a very long time. Examples of consensus Monte Carlo are shown for simple models where single-machine solutions are available, for large single-layer hierarchical models, and for Bayesian additive regression trees (BART).

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Estimation for the Type-II Extreme Value Distribution Based on Progressive Type-II Censoring

In this paper, we discuss the statistical inference on the unknown parameters and reliability function of type-II extreme value (EVII) distribution when the observed data are progressively type-II censored. By applying EM algorithm, we obtain maximum likelihood estimates (MLEs). We also suggest approximate maximum likelihood estimators (AMLEs), which have explicit expressions. We provide Bayes ...

متن کامل

Variational Consensus Monte Carlo

Practitioners of Bayesian statistics have long depended on Markov chain Monte Carlo (MCMC) to obtain samples from intractable posterior distributions. Unfortunately, MCMC algorithms are typically serial, and do not scale to the large datasets typical of modern machine learning. The recently proposed consensus Monte Carlo algorithm removes this limitation by partitioning the data and drawing sam...

متن کامل

Approximating Bayes Estimates by Means of the Tierney Kadane, Importance Sampling and Metropolis-Hastings within Gibbs Methods in the Poisson-Exponential Distribution: A Comparative Study

Here, we work on the problem of point estimation of the parameters of the Poisson-exponential distribution through the Bayesian and maximum likelihood methods based on complete samples. The point Bayes estimates under the symmetric squared error loss (SEL) function are approximated using three methods, namely the Tierney Kadane approximation method, the importance sampling method and the Metrop...

متن کامل

Importance Weighted Consensus Monte Carlo for Distributed Bayesian Inference

The recent explosion in big data has created a significant challenge for efficient and scalable Bayesian inference. In this paper, we consider a divide-and-conquer setting in which the data is partitioned into different subsets with communication constraints, and a proper combination strategy is used to aggregate the Monte Carlo samples drawn from the local posteriors based on the dataset subse...

متن کامل

Unbiased Bayes for Big Data: Paths of Partial Posteriors

A key quantity of interest in Bayesian inference are expectations of functions with respect to a posterior distribution. Markov Chain Monte Carlo is a fundamental tool to consistently compute these expectations via averaging samples drawn from an approximate posterior. However, its feasibility is being challenged in the era of so called Big Data as all data needs to be processed in every iterat...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013